KMID : 1022420150070010003
Phonetics and Speech Sciences, 2015, Vol. 7, No. 1, pp. 3-10
Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition
Choi Young-Ho, Ban Sung-Min, Kim Kyung-Wha, Kim Hyung-Soon
Abstract
In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to the speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
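The abstract does not give the warping function or the CTM configuration, so the following is only a rough sketch of the two ideas under stated assumptions. It assumes the common first-order all-pass (bilinear) frequency map, in which a warping factor `alpha` of 0 leaves the frequency axis linear, and it assumes the CTM is formed by applying a DCT along the time axis of a sliding window of cepstral frames. The function names, the window length of 9 frames, and the choice of 3 time-DCT coefficients are hypothetical and are not taken from the paper.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Map angular frequency omega (radians, 0..pi) through the first-order
    all-pass (bilinear) warp. alpha = 0 returns omega unchanged."""
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def dct_basis(n_out, n_in):
    """First n_out unnormalized DCT-II basis vectors of length n_in (as rows)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_in)

def cepstral_time_matrix(cepstra, window=9, n_time=3):
    """Sketch of CTM features (assumed configuration, not the paper's).

    cepstra: array of shape (T, n_ceps), one cepstral vector per frame.
    For each frame, a window of `window` frames (odd, centered, edge-padded)
    is transformed with a DCT along time and the first `n_time` coefficients
    are kept. Returns an array of shape (T, n_ceps * n_time).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    half = window // 2
    padded = np.pad(cepstra, ((half, half), (0, 0)), mode="edge")
    basis = dct_basis(n_time, window)          # (n_time, window)
    rows = []
    for t in range(cepstra.shape[0]):
        block = padded[t:t + window]           # (window, n_ceps)
        ctm = basis @ block                    # (n_time, n_ceps)
        rows.append(ctm.T.reshape(-1))         # grouped per cepstral coefficient
    return np.stack(rows)
```

In this sketch the zeroth time-DCT coefficient is proportional to the local average of each cepstral coefficient, and the higher-order coefficients capture how it changes across the window, which is the role the delta and delta-delta features play in the conventional feature set.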
KEYWORD
speaker recognition, GMM-UBM, JFA, MFCC, LFCC, BWFCC, delta feature, cepstral-time matrix